Search | VHL Regional Portal

1.

Design, in silico evaluation, and in vitro verification of new bivalent Smac mimetics with pro-apoptotic activity.

Huang, Qingsheng; Peng, Yin; Peng, Yuefeng; Lin, Huijuan; Deng, Shiqi; Feng, Shengzhong; Wei, Yanjie.

Methods ; 224: 35-46, 2024 Apr.

Article in English | MEDLINE | ID: mdl-38373678

ABSTRACT

Bivalent Smac mimetics have been shown to possess binding affinity and pro-apoptotic activity similar to or greater than those of native Smac, a protein dimer able to neutralize the anti-apoptotic activity of XIAP, an inhibitor of caspase enzymes that endows cancer cells with resistance to anticancer drugs. We designed five new bivalent Smac mimetics, each formed by a linker tethering two diazabicyclic cores that serve as the IAP-binding motifs. We built in silico models of the five mimetics with the TwistDock workflow and evaluated their conformational tendencies, which suggest that compound 3, whose linker is n-hexylene, possesses the highest binding potency among the five. After the compounds were synthesized, their ability to inhibit tumour cell growth and induce apoptosis in experiments with SK-OV-3 and MDA-MB-231 cancer cell lines confirmed our prediction. Among the five mimetics, compound 3 displays promising pro-apoptotic activity and deserves further optimization.

Subjects

Antineoplastic Agents; Neoplasms; Humans; Inhibitor of Apoptosis Proteins/metabolism; Inhibitor of Apoptosis Proteins/pharmacology; X-Linked Inhibitor of Apoptosis Protein/metabolism; X-Linked Inhibitor of Apoptosis Protein/pharmacology; Antineoplastic Agents/pharmacology; Antineoplastic Agents/chemistry; Molecular Conformation; Apoptosis; Cell Line, Tumor

2.

Corrigendum: Identification of circRNA biomarker for gastric cancer through integrated analysis.

Hossain, Md Tofazzal; Li, Song; Reza, Md Selim; Feng, Shengzhong; Zhang, Xiaojing; Jin, Zhe; Wei, Yanjie; Peng, Yin.

Front Mol Biosci ; 10: 1249019, 2023.

Article in English | MEDLINE | ID: mdl-37469706

ABSTRACT

[This corrects the article DOI: 10.3389/fmolb.2022.857320.].

3.

Optimize data-driven multi-agent simulation for COVID-19 transmission.

Jin, Chao; Zhang, Hao; Yin, Ling; Zhang, Yong; Feng, Sheng-Zhong.

BMC Bioinformatics ; 23(1): 260, 2022 Jul 01.

Article in English | MEDLINE | ID: mdl-35778688

ABSTRACT

BACKGROUND: Multi-agent simulation is an essential technique for exploring complex systems. In research on contagious diseases, it is widely used to analyze their spread mechanisms, especially for preventing COVID-19. The transmission dynamics and interventions of COVID-19 have been modeled elaborately with this method, but its computational performance has seldom been addressed. Because such simulation usually suffers from inadequate CPU utilization and poor data locality, optimizing its performance is both challenging and important for analyzing disease spread in real time. RESULTS: This paper explores approaches to optimizing multi-agent simulation of COVID-19. The work focuses on algorithm and data structure designs for improving performance, as well as on parallelization strategies. We propose two successive methods to optimize the computation: we construct a case-focused iteration algorithm to improve data locality, and we propose a fast data-mapping scheme called the hierarchical hash table to accelerate hash operations. As a result, the case-focused method reduces cache references by [Formula: see text] and achieves a [Formula: see text] speedup. The hierarchical hash table further boosts computation speed by 47%, and a parallel implementation with 20 threads on CPU achieves a [Formula: see text] speedup. CONCLUSIONS: In this work, we propose optimizations for multi-agent simulation of COVID-19 transmission in terms of algorithm and data structure. Benefiting from improved locality and a multi-threaded implementation, our methods significantly accelerate the simulation. They are promising for supporting real-time prevention of COVID-19 and other infectious diseases in the future.
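The abstract gives no implementation details, so the following is only a minimal sketch of what iterating over cases rather than over all agents could look like. The function names, the "S"/"I"/"R" status labels, the location names, and the rule that every co-located susceptible agent counts as exposed are all assumptions made for illustration; they are not taken from the paper, and the hierarchical hash table is not modeled.

```python
from collections import defaultdict


def build_members(location_of):
    """Group agent ids by the location they currently occupy."""
    members = defaultdict(list)
    for agent, place in location_of.items():
        members[place].append(agent)
    return members


def step_scan_all(state, location_of, members):
    """Baseline: visit every agent and skip the non-infectious ones."""
    exposed = set()
    for agent, status in state.items():
        if status != "I":
            continue
        for other in members[location_of[agent]]:
            if state[other] == "S":
                exposed.add(other)
    return exposed


def step_case_focused(cases, state, location_of, members):
    """Case-focused: visit only the current cases and their co-located agents."""
    exposed = set()
    for agent in cases:
        for other in members[location_of[agent]]:
            if state[other] == "S":
                exposed.add(other)
    return exposed


# Hypothetical toy population: one infectious agent sharing a home.
state = {0: "I", 1: "S", 2: "S", 3: "S", 4: "R"}
location_of = {0: "home_a", 1: "home_a", 2: "office", 3: "office", 4: "home_a"}
members = build_members(location_of)
cases = [agent for agent, status in state.items() if status == "I"]

baseline = step_scan_all(state, location_of, members)
focused = step_case_focused(cases, state, location_of, members)
```

Both functions find the same exposed agents; the case-focused version simply touches far fewer records when cases are a small fraction of the population, which is the intuition behind the locality gain the abstract reports.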

Subjects

COVID-19; Algorithms; Computer Simulation; Humans; Software

4.

Reconstruction of Full-Length circRNA Sequences Using Chimeric Alignment Information.

Hossain, Md Tofazzal; Zhang, Jingjing; Reza, Md Selim; Peng, Yin; Feng, Shengzhong; Wei, Yanjie.

Int J Mol Sci ; 23(12), 2022 Jun 17.

Article in English | MEDLINE | ID: mdl-35743218

ABSTRACT

Circular RNAs (circRNAs) are RNA molecules formed by joining a downstream 3' splice donor site and an upstream 5' splice acceptor site. Several recent studies have identified circRNAs as potential biomarkers for different diseases. A number of methods are available for identifying circRNAs, but these methods cannot provide full-length sequences. Reconstruction of the full-length sequences is crucial for downstream analyses in circRNA research, including differential expression analysis, circRNA-miRNA interaction analysis and other functional studies of the circRNAs. However, only a limited number of methods are available in the literature for the reconstruction of full-length circRNA sequences. We developed a new method, circRNA-full, for full-length circRNA sequence reconstruction utilizing chimeric alignment information from the STAR aligner. To evaluate our method, we used full-length circRNA sequences produced by isocirc and ciri-long from long-read RNA-seq data. Our method achieved a better reconstruction rate, precision, sensitivity and F1 score than the existing full-length circRNA sequence reconstruction tool ciri-full for both human and mouse data.

Subjects

RNA Splice Sites; RNA, Circular; Animals; Mice; RNA/genetics; RNA/metabolism; RNA, Circular/genetics; RNA-Seq

5.

Combinational Recommendation of Vaccinations, Mask-Wearing, and Home-Quarantine to Control Influenza in Megacities: An Agent-Based Modeling Study With Large-Scale Trajectory Data.

Zhang, Hao; Yin, Ling; Mao, Liang; Mei, Shujiang; Chen, Tianmu; Liu, Kang; Feng, Shengzhong.

Front Public Health ; 10: 883624, 2022.

Article in English | MEDLINE | ID: mdl-35719665

ABSTRACT

The outbreak of COVID-19 stimulated a new round of discussion on how to deal with respiratory infectious diseases. Influenza viruses have led to several pandemics worldwide. The spatiotemporal characteristics of influenza transmission in modern cities, especially megacities, are not well-known, which increases the difficulty of influenza prevention and control for populous urban areas. For a long time, influenza prevention and control measures have focused on vaccination of the elderly and children, and school closure. Since the outbreak of COVID-19, the public's awareness of measures such as vaccinations, mask-wearing, and home-quarantine has generally increased in some regions of the world. To control the influenza epidemic and reduce the proportion of infected people with high mortality, the combination of these three measures needs quantitative evaluation based on the spatiotemporal transmission characteristics of influenza in megacities. Given that the agent-based model with both demographic attributes and fine-grained mobility is a key planning tool in deploying intervention strategies, this study proposes a spatially explicit agent-based influenza model for assessing and recommending the combinations of influenza control measures. This study considers Shenzhen city, China as the research area. First, a spatially explicit agent-based influenza transmission model was developed by integrating large-scale individual trajectory data and human response behavior. Then, the model was evaluated across multiple intra-urban spatial scales based on confirmed influenza cases. Finally, the model was used to evaluate the combined effects of the three interventions (V: vaccinations, M: mask-wearing, and Q: home-quarantining) under different compliance rates, and their optimal combinations for given control objectives were recommended. 
This study reveals that adults were a high-risk population with a low reporting rate, and children formed the lowest infected proportion and had the highest reporting rate in Shenzhen. In addition, this study systematically recommended different combinations of vaccinations, mask-wearing, and home-quarantine with different compliance rates for different control objectives to deal with the influenza epidemic. For example, the "V45%-M60%-Q20%" strategy can maintain the infection percentage below 5%, while the "V20%-M60%-Q20%" strategy can maintain the infection percentage below 15%. The model and policy recommendations from this study provide a tool and intervention reference for influenza epidemic management in the post-COVID-19 era.

Subjects

COVID-19; Influenza, Human; Adult; Aged; COVID-19/prevention & control; Child; Cities; Humans; Influenza, Human/epidemiology; Influenza, Human/prevention & control; Pandemics/prevention & control; Quarantine; SARS-CoV-2; Systems Analysis; Vaccination

6.

Bioinformatics Screening of Potential Biomarkers from mRNA Expression Profiles to Discover Drug Targets and Agents for Cervical Cancer.

Reza, Md Selim; Harun-Or-Roshid, Md; Islam, Md Ariful; Hossen, Md Alim; Hossain, Md Tofazzal; Feng, Shengzhong; Xi, Wenhui; Mollah, Md Nurul Haque; Wei, Yanjie.

Int J Mol Sci ; 23(7), 2022 Apr 02.

Article in English | MEDLINE | ID: mdl-35409328

ABSTRACT

Bioinformatics analysis plays a vital role in identifying potential genomic biomarkers more accurately from an enormous number of candidates, at lower time and cost than wet-lab experimental procedures, for disease diagnosis, prognosis, and therapy. Cervical cancer (CC) is one of the most malignant diseases seen in women worldwide. This study aimed to identify potential key genes (KGs) and to highlight their functions, signaling pathways, and candidate drugs for CC diagnosis and targeted therapy. Four publicly available microarray datasets of CC were analyzed to identify differentially expressed genes (DEGs) by the LIMMA approach through the GEO2R online tool. We identified 116 common DEGs (cDEGs), which were used to identify seven KGs (AURKA, BRCA1, CCNB1, CDK1, MCM2, NCAPG2, and TOP2A) by protein-protein interaction (PPI) network analysis. The GO functional and KEGG pathway enrichment analyses of the KGs revealed some important functions and signaling pathways that were significantly associated with CC infections. The interaction network analysis identified four TF proteins and two miRNAs as the key transcriptional and post-transcriptional regulators of the KGs. Taking as drug target receptors the proteins of our seven KGs, the four key TF proteins, and the proteins of seven top-ranked KGs from previously published work (five of which were shared with our proposed seven KGs), we performed docking analysis against 80 meta-drug agents that had already been published as CC drugs. We found Paclitaxel, Vinorelbine, Vincristine, Docetaxel, Everolimus, Temsirolimus, and Cabazitaxel to be the seven top-ranked candidate drugs. Finally, we investigated the binding stability of the three top-ranked drugs (Paclitaxel, Vincristine, Vinorelbine) with the three top-ranked proposed receptors (AURKA, CDK1, TOP2A) using 100 ns MD-based MM-PBSA simulations and observed stable performance. Therefore, the proposed drugs might play a vital role in the treatment of CC.

Subjects

Computational Biology; Uterine Cervical Neoplasms; Aurora Kinase A/genetics; Biomarkers, Tumor/genetics; Chromosomal Proteins, Non-Histone/genetics; Computational Biology/methods; Databases, Genetic; Early Detection of Cancer/methods; Female; Gene Expression Profiling/methods; Gene Expression Regulation, Neoplastic; Gene Regulatory Networks; Humans; Paclitaxel; RNA, Messenger; Uterine Cervical Neoplasms/drug therapy; Uterine Cervical Neoplasms/genetics; Vincristine; Vinorelbine

7.

Identification of circRNA Biomarker for Gastric Cancer through Integrated Analysis.

Hossain, Md Tofazzal; Li, Song; Reza, Md Selim; Feng, Shengzhong; Zhang, Xiaojing; Jin, Zhe; Wei, Yanjie; Peng, Yin.

Front Mol Biosci ; 9: 857320, 2022.

Article in English | MEDLINE | ID: mdl-35359600

ABSTRACT

Gastric cancer (GC) is one of the most common malignant tumors and ranks third in cancer mortality globally. Although many advances have been made in the diagnosis and treatment of gastric cancer, an ideal biomarker for its diagnosis and treatment is still lacking. Because of the poor prognosis, the survival rate has not improved much. Circular RNAs (circRNAs) are single-stranded RNAs with a covalently closed loop structure that lack 5'-3' polarity and a 3' polyA tail. Because of their circular structure, circRNAs are more stable than linear RNAs. Previous studies have found that circRNAs are involved in several biological processes, such as cell cycle, proliferation, apoptosis, autophagy, migration and invasion, in different cancers, and participate in molecular mechanisms including sponging microRNAs (miRNAs), protein translation and binding to RNA-binding proteins. Several studies have reported that circRNAs play a crucial role in the occurrence and development of different types of cancer. Although some studies have reported several circRNAs in gastric cancer, more studies are needed to search for new biomarkers for gastric cancer diagnosis and treatment. Here, we investigated potential circRNA biomarkers for GC using next-generation sequencing (NGS) data collected from 5 paired GC samples. A total of 45,783 circRNAs were identified in all samples, and 478 of them were differentially expressed (DE). Gene ontology (GO) analysis of the host genes of the DE circRNAs showed that some genes were enriched in several important biological processes, molecular functions and cellular components. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis revealed that some host genes were enriched in several GC-related pathways. The circRNA-miRNA-gene interaction network analysis showed that two circRNAs, circCEACAM5 and circCOL1A1, interacted with gastric cancer-related miRNAs, and that their host genes were also important therapeutic and prognostic biomarkers for GC. The experimental results also validated that these two circRNAs were DE in GC compared with adjacent normal tissues. Overall, our findings suggest that circCEACAM5 and circCOL1A1 might be potential biomarkers for the diagnosis and treatment of GC.

8.

Boosting the predictive performance with aqueous solubility dataset curation.

Meng, Jintao; Chen, Peng; Wahib, Mohamed; Yang, Mingjun; Zheng, Liangzhen; Wei, Yanjie; Feng, Shengzhong; Liu, Wei.

Sci Data ; 9(1): 71, 2022 03 03.

Article in English | MEDLINE | ID: mdl-35241693

ABSTRACT

Intrinsic solubility is a critical property in the pharmaceutical industry that affects the in vivo bioavailability of small-molecule drugs. However, solubility prediction with artificial intelligence (AI) faces insufficient data, poor data quality, and the lack of unified measurements for AI and physics-based approaches. We collect 7 aqueous solubility datasets and present a dataset curation workflow. Evaluating the curated data with two expanded deep learning methods, we observe improved RMSE scores on all curated thermodynamic datasets. We also compare expanded Chemprop, enhanced with curated data, against a state-of-the-art physics-based approach using Pearson and Spearman correlation coefficients. Expanded Chemprop achieves similar performance, with a Pearson coefficient of 0.930 and a Spearman coefficient of 0.947. Pearson and Spearman values that improve steadily with increasing numbers of data points are also shown. In addition, the computational advantage of AI models enables quick evaluation of a large set of molecules during the hit identification or lead optimization stages, which supports further decision making within the time cycle of the drug discovery stage.
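For readers unfamiliar with the three metrics named in this abstract, the sketch below computes RMSE, Pearson and Spearman coefficients in plain Python. It is not the authors' evaluation code; the function names and the `measured` and `predicted` values are invented for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between measured and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pearson(x, y):
    """Pearson linear correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up log-solubility values, not data from the paper.
measured = [-3.2, -1.5, -4.8, -2.1]
predicted = [-3.0, -1.9, -4.5, -2.6]

error = rmse(measured, predicted)
r_pearson = pearson(measured, predicted)
r_spearman = spearman(measured, predicted)
```

RMSE penalizes absolute deviations, Pearson measures linear agreement, and Spearman only checks whether the ordering of molecules is preserved, which is why a model can score well on Spearman while still showing a nonzero RMSE.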

9.

Identification of Potential Long Non-Coding RNA Candidates that Contribute to Triple-Negative Breast Cancer in Humans through Computational Approach.

Rahman, Md Motiar; Hossain, Md Tofazzal; Reza, Md Selim; Peng, Yin; Feng, Shengzhong; Wei, Yanjie.

Int J Mol Sci ; 22(22), 2021 Nov 16.

Article in English | MEDLINE | ID: mdl-34830241

ABSTRACT

Breast cancer (BC) is the most frequent malignancy identified in adult females, resulting in enormous financial losses worldwide. Owing to the heterogeneity as well as various molecular subtypes, the molecular pathways underlying carcinogenesis in various forms of BC are distinct. Therefore, the advancement of alternative therapy is required to combat the ailment. Recent analyses propose that long non-coding RNAs (lncRNAs) perform an essential function in controlling immune response, and therefore, may provide essential information about the disorder. However, their function in patients with triple-negative BC (TNBC) has not been explored in detail. Here, we analyzed the changes in the genomic expression of messenger RNA (mRNA) and lncRNA in standard control in response to cancer metastasis using publicly available single-cell RNA-Seq data. We identified a total of 197 potentially novel lncRNAs in TNBC patients of which 86 were differentially upregulated and 111 were differentially downregulated. In addition, among the 909 candidate lncRNA transcripts, 19 were significantly differentially expressed (DE) of which three were upregulated and 16 were downregulated. On the other hand, 1901 mRNA transcripts were significantly DE of which 1110 were upregulated and 791 were downregulated by TNBCs subtypes. The Gene Ontology (GO) analyses showed that some of the host genes were enriched in various biological, molecular, and cellular functions. The Kyoto encyclopedia of genes and genomes (KEGG) pathway analysis showed that some of the genes were involved in only one pathway of prostate cancer. The lncRNA-miRNA-gene network analysis showed that the lncRNAs TCONS_00076394 and TCONS_00051377 interacted with breast cancer-related micro RNAs (miRNAs) and the host genes of these lncRNAs were also functionally related to breast cancer. Thus, this study provides novel lncRNAs as potential biomarkers for the therapeutic intervention of this cancer subtype.

Subjects

MicroRNAs/genetics; RNA, Long Noncoding/genetics; RNA, Messenger/genetics; RNA, Neoplasm/genetics; Triple Negative Breast Neoplasms/genetics; Biomarkers, Tumor/genetics; Biomarkers, Tumor/metabolism; Computational Biology/methods; Female; Gene Expression Profiling; Gene Expression Regulation, Neoplastic; Gene Ontology; Gene Regulatory Networks; Humans; Mammary Glands, Human/metabolism; Mammary Glands, Human/pathology; MicroRNAs/classification; MicroRNAs/metabolism; Molecular Sequence Annotation; RNA, Long Noncoding/classification; RNA, Long Noncoding/metabolism; RNA, Messenger/classification; RNA, Messenger/metabolism; RNA, Neoplasm/classification; RNA, Neoplasm/metabolism; Triple Negative Breast Neoplasms/diagnosis; Triple Negative Breast Neoplasms/metabolism; Triple Negative Breast Neoplasms/pathology

10.

A data driven agent-based model that recommends non-pharmaceutical interventions to suppress Coronavirus disease 2019 resurgence in megacities.

Yin, Ling; Zhang, Hao; Li, Yuan; Liu, Kang; Chen, Tianmu; Luo, Wei; Lai, Shengjie; Li, Ye; Tang, Xiujuan; Ning, Li; Feng, Shengzhong; Wei, Yanjie; Zhao, Zhiyuan; Wen, Ying; Mao, Liang; Mei, Shujiang.

J R Soc Interface ; 18(181): 20210112, 2021 08.

Article in English | MEDLINE | ID: mdl-34428950

ABSTRACT

Before herd immunity against Coronavirus disease 2019 (COVID-19) is achieved by mass vaccination, science-based guidelines for non-pharmaceutical interventions are urgently needed to reopen megacities. This study integrated massive mobile phone tracking records, census data and building characteristics into a spatially explicit agent-based model to simulate COVID-19 spread among 11.2 million individuals living in Shenzhen City, China. After validation by local epidemiological observations, the model was used to assess the probability of COVID-19 resurgence if sporadic cases occurred in a fully reopened city. Combined scenarios of three critical non-pharmaceutical interventions (contact tracing, mask wearing and prompt testing) were assessed at various levels of public compliance. Our results show a greater than 50% chance of disease resurgence if the city reopened without contact tracing. However, tracing household contacts, in combination with mandatory mask use and prompt testing, could suppress the probability of resurgence under 5% within four weeks. If household contact tracing could be expanded to work/class group members, the COVID resurgence could be avoided if 80% of the population wear facemasks and 40% comply with prompt testing. Our assessment, including modelling for different scenarios, helps public health practitioners tailor interventions within Shenzhen City and other world megacities under a variety of suppression timelines, risk tolerance, healthcare capacity and public compliance.

Subjects

COVID-19/prevention & control; Communicable Disease Control/methods; Models, Theoretical; COVID-19 Testing; China; Cities; Contact Tracing; Humans; Immunity, Herd; Masks

11.

COMTOP: Protein Residue-Residue Contact Prediction through Mixed Integer Linear Optimization.

Reza, Md Selim; Zhang, Huiling; Hossain, Md Tofazzal; Jin, Langxi; Feng, Shengzhong; Wei, Yanjie.

Membranes (Basel) ; 11(7), 2021 Jun 30.

Article in English | MEDLINE | ID: mdl-34209399

ABSTRACT

Protein contact prediction helps reconstruct the tertiary structure that greatly determines a protein's function; therefore, contact prediction from the sequence is an important problem. Recently there has been exciting progress on this problem, but many of the existing methods still have low prediction accuracy. In this paper, we present a new mixed integer linear programming (MILP)-based consensus method: a Consensus scheme based On a Mixed integer linear opTimization method for prOtein contact Prediction (COMTOP). The MILP-based consensus method combines the strengths of seven selected protein contact prediction methods (CCMpred, EVfold, DeepCov, NNcon, PconsC4, plmDCA, and PSICOV) by optimizing the number of correctly predicted contacts, thereby achieving better prediction accuracy. The proposed hybrid protein residue-residue contact prediction scheme was tested on four independent test sets. For 239 highly non-redundant proteins, the method showed prediction accuracies of 59.68%, 70.79%, 78.86%, 89.04%, 94.51%, and 97.35% for top-5L, top-3L, top-2L, top-L, top-L/2, and top-L/5 contacts, respectively. When tested on the CASP13 and CASP14 test sets, the proposed method obtained accuracies of 75.91% and 77.49% for top-L/5 predictions, respectively. COMTOP was further tested on 57 non-redundant α-helical transmembrane proteins and achieved prediction accuracies of 64.34% and 73.91% for top-L/2 and top-L/5 predictions, respectively. For all test datasets, the improvement of COMTOP in accuracy over the seven individual methods increased with the number of predicted contacts. For example, COMTOP performed much better for large numbers of predicted contacts (such as top-5L and top-3L) than for small numbers of predicted contacts (such as top-L/2 and top-L/5). The results and analysis demonstrate that COMTOP can significantly improve the performance of the individual methods; COMTOP is therefore more robust across different types of test sets. COMTOP also showed better or comparable predictions when compared with state-of-the-art predictors.
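The top-5L to top-L/5 figures quoted above follow a common convention in contact prediction: rank all predicted residue pairs by score, keep the best k × L of them (L being the sequence length), and report the fraction that are true contacts. A minimal sketch of that metric follows; the function name, the scores and the contact set are hypothetical, and this is not the COMTOP code.

```python
def top_k_precision(scores, true_contacts, seq_len, factor):
    """Fraction of the top (factor * seq_len) scored pairs that are true contacts.

    scores: dict mapping (i, j) residue pairs to predicted contact scores.
    true_contacts: set of (i, j) pairs that are contacts in the native structure.
    factor: 5 for top-5L, 1 for top-L, 0.5 for top-L/2, 0.2 for top-L/5, and so on.
    """
    n_top = max(1, int(seq_len * factor))
    ranked = sorted(scores, key=scores.get, reverse=True)[:n_top]
    hits = sum(1 for pair in ranked if pair in true_contacts)
    return hits / len(ranked)


# Hypothetical predictions for a 10-residue sequence.
scores = {(1, 8): 0.9, (2, 9): 0.8, (3, 10): 0.3, (1, 9): 0.1}
true_contacts = {(1, 8), (3, 10)}

precision_l10 = top_k_precision(scores, true_contacts, seq_len=10, factor=0.1)
precision_l5 = top_k_precision(scores, true_contacts, seq_len=10, factor=0.2)
```

Because shorter lists keep only the most confident pairs, accuracy normally rises from top-5L toward top-L/5, which matches the ordering of the percentages reported in the abstract.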

12.

FcircSEC: An R Package for Full Length circRNA Sequence Extraction and Classification.

Hossain, Md Tofazzal; Peng, Yin; Feng, Shengzhong; Wei, Yanjie.

Int J Genomics ; 2020: 9084901, 2020.

Article in English | MEDLINE | ID: mdl-32566642

ABSTRACT

Circular RNAs (circRNAs) are formed by joining the 3' and 5' ends of RNA molecules. Identification of circRNAs is an important part of circRNA research. The circRNA prediction methods can predict the circRNAs with start and end positions in the chromosome but cannot identify the full-length circRNA sequences. We present an R package FcircSEC (Full Length circRNA Sequence Extraction and Classification) to extract the full-length circRNA sequences based on gene annotation and the output of any circRNA prediction tools whose output has a chromosome, start and end positions, and a strand for each circRNA. To validate FcircSEC, we have used three databases, circbase, circRNAdb, and plantcircbase. With information such as the chromosome and strand of each circRNA as the input, the identified sequences by FcircSEC are consistent with the databases. The novelty of FcircSEC is that it can take the output of state-of-the-art circRNA prediction tools as input and is applicable for human and other species. We also classify the circRNAs as exonic, intronic, and others. The R package FcircSEC is freely available.

13.

Counting Kmers for Biological Sequences at Large Scale.

Ge, Jianqiu; Meng, Jintao; Guo, Ning; Wei, Yanjie; Balaji, Pavan; Feng, Shengzhong.

Interdiscip Sci ; 12(1): 99-108, 2020 Mar.

Article in English | MEDLINE | ID: mdl-31734873

ABSTRACT

Counting the abundance of all the distinct kmers in biological sequence data is a fundamental step in bioinformatics. Its applications include de novo genome assembly, error correction, etc. With the development of sequencing technology, the sequence data in a single project can reach Petabyte-scale or Terabyte-scale nucleotides. Counting the kmer abundance of such sequencing data is beyond the memory and computing capacity of a single computing node, and processing it efficiently on a high-performance computing cluster is a challenge. We therefore propose SWAPCounter, a highly scalable distributed approach for kmer counting. The approach embeds an MPI streaming I/O module for loading huge data sets at high speed, and a counting bloom filter module for both memory and communication efficiency. By overlapping all the counting steps, SWAPCounter achieves high scalability with high parallel efficiency. The experimental results indicate that SWAPCounter is competitive with two other tools, KMC2 and MSPKmerCounter, in a shared-memory environment. Moreover, SWAPCounter shows the highest scalability in strong scaling experiments. In our experiment on the Cetus supercomputer, SWAPCounter scales to 32,768 cores with 79% parallel efficiency (using 2048 cores as the baseline) when processing 4 TB of sequence data from 1000 Genomes. The source code of SWAPCounter is publicly available at https://github.com/mengjintao/SWAPCounter.
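Two ideas in this abstract are easy to make concrete: what kmer counting computes, and how parallel efficiency relative to a baseline core count is defined. The sketch below is a single-process illustration using an exact in-memory counter; it is not SWAPCounter, which is distributed over MPI and uses a counting bloom filter. The function names and the timings passed to `parallel_efficiency` are assumptions made for the example.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count every length-k substring across a collection of sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def parallel_efficiency(base_cores, base_time, cores, time):
    """Strong-scaling efficiency relative to a baseline run.

    Perfect scaling keeps cores * time constant, which gives 1.0.
    """
    return (base_cores * base_time) / (cores * time)


counts = count_kmers(["ACGTACGT", "CGTA"], k=4)

# Hypothetical timings: 16x more cores but only 12.8x faster.
efficiency = parallel_efficiency(base_cores=2048, base_time=16.0,
                                 cores=32768, time=1.25)
```

An exact counter like this holds every distinct kmer in memory, which is precisely what stops scaling at terabyte inputs and motivates the distributed, filter-based design described in the abstract.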

Subjects

Computational Biology/methods; Genomics/methods; Algorithms; High-Throughput Nucleotide Sequencing; Sequence Analysis, DNA; Software

14.

The TwistDock workflow for evaluation of bivalent Smac mimetics targeting XIAP.

Huang, Qingsheng; Peng, Yin; Peng, Yuefeng; Wei, Dan; Wei, Yanjie; Feng, Shengzhong.

Drug Des Devel Ther ; 13: 1373-1388, 2019.

Article in English | MEDLINE | ID: mdl-31118573

ABSTRACT

Purpose: Mimetics based on Smac, the native inhibitor of XIAP, are promising drug candidates for the treatment of cancer. Bivalent Smac mimetics inhibit XIAP with even higher potency than monovalent mimetics, but how to optimize the linker that tethers the two monovalent binding motifs remains controversial. Methods: To construct an ensemble of bivalent complex structures for evaluating various linkers, we propose a workflow named TwistDock, which consists of monovalent docking and linker twisting steps and samples the degrees of freedom by focusing on the rotation of the linker's single bonds. Results: The obtained conformations of the bivalent complexes are distributed randomly in the conformational space with respect to two reaction coordinates introduced by the linker: the distance between the two binding motifs, and the dihedral angle between the two planes through the linker and each of the binding motifs. Molecular dynamics simulations starting from the 10 lowest-enthalpy conformations of every complex show that the conformational tendency of the complex formed with compound 9, one of the compounds with the largest binding affinity, is distinct from the others. By umbrella sampling of this complex, we find the global minimum of its free energy landscape. The structure shows that the linker favors a compact conformation and that the two BIR domains of XIAP encompass the ligand on opposite sides. Conclusion: TwistDock can be used for fine-tuning bivalent ligands targeting XIAP and similar dimerized or oligomerized receptors.

Subjects

Biomimetic Materials/pharmacology; Oligopeptides/pharmacology; X-Linked Inhibitor of Apoptosis Protein/antagonists & inhibitors; Baculoviral IAP Repeat-Containing 3 Protein/antagonists & inhibitors; Baculoviral IAP Repeat-Containing 3 Protein/metabolism; Biomimetic Materials/chemistry; Humans; Inhibitor of Apoptosis Proteins/antagonists & inhibitors; Inhibitor of Apoptosis Proteins/metabolism; Ligands; Models, Molecular; Molecular Conformation; Oligopeptides/chemistry; Ubiquitin-Protein Ligases/antagonists & inhibitors; Ubiquitin-Protein Ligases/metabolism; X-Linked Inhibitor of Apoptosis Protein/metabolism

15.

WLDISR: Weighted Local Sparse Representation-Based Depth Image Super-Resolution for 3D Video System.

Zhang, Huan; Zhang, Yun; Wang, Hanli; Ho, Yo-Sung; Feng, Shengzhong.

IEEE Trans Image Process ; 28(2): 561-576, 2019 Feb.

Article in English | MEDLINE | ID: mdl-30136946

ABSTRACT

In this paper, we propose Weighted Local sparse representation based Depth Image Super-Resolution (WLDISR) schemes aimed at improving the Virtual View Image (VVI) quality of 3D video systems. Unlike color images, depth images are mainly used to provide geometrical information for synthesizing VVIs. Because the view synthesis characteristics of textural structures and smooth regions in depth images differ, we divide the depth images into edge and smooth patches and learn a local dictionary for each. Meanwhile, a weight term is derived and incorporated explicitly in the cost function to reflect the different importance of edge structures and smooth regions to VVI quality. Local sparse representation and weighted sparse representation are then used jointly in both the dictionary learning and reconstruction phases of depth image super-resolution. Based on different optimizations of the learning and reconstruction modules, three WLDISR schemes, WLDISR-D, WLDISR-R, and WLDISR-ALL, are proposed. Experimental results on 3D sequences demonstrate that the proposed WLDISR-D, WLDISR-R, and WLDISR-ALL schemes achieve average gains of more than 1.9, 2.03, and 2.16 dB, respectively, in VVI quality compared with state-of-the-art schemes. In addition, the visual quality of the VVIs is also improved.

16.

Ensemble Methods with Voting Protocols Exhibit Superior Performance for Predicting Cancer Clinical Endpoints and Providing More Complete Coverage of Disease-Related Genes.

Jing, Runyu; Liang, Yu; Ran, Yi; Feng, Shengzhong; Wei, Yanjie; He, Li.

Int J Genomics ; 2018: 8124950, 2018.

Article in English | MEDLINE | ID: mdl-29546047

ABSTRACT

In genetic data modeling, the use of a limited number of samples for modeling and predicting, especially well below the attribute number, is difficult due to the enormous number of genes detected by a sequencing platform. In addition, many studies commonly use machine learning methods to evaluate genetic datasets to identify potential disease-related genes and drug targets, but to the best of our knowledge, the information associated with the selected gene set was not thoroughly elucidated in previous studies. To identify a relatively stable scheme for modeling limited samples in the gene datasets and reveal the information that they contain, the present study first evaluated the performance of a series of modeling approaches for predicting clinical endpoints of cancer and later integrated the results using various voting protocols. As a result, we proposed a relatively stable scheme that used a set of methods with an ensemble algorithm. Our findings indicated that the ensemble methodologies are more reliable for predicting cancer prognoses than single machine learning algorithms as well as for gene function evaluating. The ensemble methodologies provide a more complete coverage of relevant genes, which can facilitate the exploration of cancer mechanisms and the identification of potential drug targets.
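The abstract does not say which voting protocols were used, so the sketch below shows only the simplest one, plain majority voting over the class labels predicted by several models. The function name and the three prediction lists are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one label per sample.

    predictions: list of equal-length label lists, one list per model.
    With an odd number of models and binary labels there are no ties.
    """
    combined = []
    for sample_labels in zip(*predictions):
        winner, _count = Counter(sample_labels).most_common(1)[0]
        combined.append(winner)
    return combined


# Hypothetical binary endpoint predictions from three models on four samples.
model_a = [1, 0, 1, 1]
model_b = [1, 1, 0, 1]
model_c = [0, 0, 1, 1]

ensemble = majority_vote([model_a, model_b, model_c])
```

Each individual model is wrong on a different sample here, yet the vote recovers a consistent answer, which illustrates why an ensemble can be more stable than any single learner when samples are scarce.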

17.

SWAP-Assembler: scalable and efficient genome assembly towards thousands of cores.

Meng, Jintao; Wang, Bingqiang; Wei, Yanjie; Feng, Shengzhong; Balaji, Pavan.

BMC Bioinformatics ; 15 Suppl 9: S2, 2014.

Article in English | MEDLINE | ID: mdl-25253533

ABSTRACT

BACKGROUND: There is a widening gap between the throughput of massively parallel sequencing machines and the ability to analyze the resulting sequencing data. Traditional assembly methods require long execution times and large amounts of memory on a single workstation, which limits their use on such massive data. RESULTS: This paper presents a highly scalable assembler named SWAP-Assembler for processing massive sequencing data using thousands of cores, where SWAP is an acronym for Small World Asynchronous Parallel model. In the paper, a mathematical description of the multi-step bi-directed graph (MSG) is provided to resolve the computational interdependence of merging edges, and a highly scalable computational framework for SWAP is developed to automatically perform the parallel computation of all operations. Graph cleaning and contig extension are also included to generate high-quality contigs. Experimental results show that SWAP-Assembler scales up to 2048 cores on the Yanhuang dataset, taking only 26 minutes, which is better than several other parallel assemblers, such as ABySS, Ray, and PASHA. Results also show that SWAP-Assembler can generate high-quality contigs with good N50 size and a low error rate; in particular, it generated the longest N50 contig sizes for the Fish and Yanhuang datasets. CONCLUSIONS: In this paper, we presented a highly scalable and efficient genome assembly software, SWAP-Assembler. Compared with several other assemblers, it showed very good performance in terms of scalability and contig quality. This software is available at: https://sourceforge.net/projects/swapassembler.
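
The N50 statistic mentioned in this abstract is commonly defined as the contig length L such that contigs of length L or longer together cover at least half of the total assembled bases. The Python sketch below computes it from a list of contig lengths; the function name `n50` and the example lengths are hypothetical and are not taken from SWAP-Assembler.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of the
    contig at which the running total first reaches half of all assembled bases.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Made-up contig lengths for illustration: the total is 54, so half is 27,
# and 10 + 9 + 8 = 27 reaches that threshold at the contig of length 8.
example_n50 = n50([2, 3, 4, 5, 6, 7, 8, 9, 10])
```

A larger N50 generally indicates a more contiguous assembly, but it says nothing about correctness, which is why the abstract reports error rate alongside it.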

Subjects

Genomics/methods , High-Throughput Nucleotide Sequencing/methods , DNA Sequence Analysis/methods , Software , Algorithms , Animals , Genome , Humans

18.

Post-processing strategies for improving local gene expression pattern analysis.

Wang, Qiang; Ye, Yunming; Huang, Joshua Zhexue; Feng, Shengzhong.

Int J Data Min Bioinform ; 7(1): 1-21, 2013.

Article in English | MEDLINE | ID: mdl-23437512

ABSTRACT

This paper proposes a new analytical process built around a soft subspace clustering method, a changing window technique, and a series of post-processing strategies to enhance the identification and characterisation of local gene expression patterns. The proposed method can be conducted interactively, facilitating the exploration and analysis of local gene expression patterns in real applications. Experimental results show that the proposed method is effective in the identification and characterisation of functional gene groups in terms of both local expression similarities and the biological coherence of genes in a cluster.

Subjects

Algorithms , Gene Expression Profiling/methods , Gene Expression , Cluster Analysis , Oligonucleotide Array Sequence Analysis/methods

19.

A fast and flexible approach to oligonucleotide probe design for genomes and gene families.

Feng, Shengzhong; Tillier, Elisabeth R M.

Bioinformatics ; 23(10): 1195-202, 2007 May 15.

Article in English | MEDLINE | ID: mdl-17392329

ABSTRACT

MOTIVATION: With hundreds of completely sequenced microbial genomes available, and advancements in DNA microarray technology, the detection of genes in microbial communities consisting of hundreds of thousands of sequences may be possible. The existing strategies developed for DNA probe design, geared toward identifying specific sequences, are not suitable because they lack the coverage, flexibility and efficiency necessary for applications in metagenomics. METHODS: ProDesign is a tool developed for the selection of oligonucleotide probes to detect members of gene families present in environmental samples. Gene family-specific probe sequences are generated based on specific and shared words, which are found with the spaced seed hashing algorithm. To detect more sequences, those sharing some common words are re-clustered into new families, and probes specific for the new families are then generated. RESULTS: The program is very flexible in that it can be used for designing probes for detecting many gene families simultaneously and specifically in one or more genomes. Neither the length nor the melting temperature of the probes needs to be predefined. We have found that ProDesign provides more flexibility, coverage and speed than other software programs used in the selection of probes for genomic and gene family arrays. AVAILABILITY: ProDesign is licensed free of charge to academic users. ProDesign and Supplementary Material can be obtained by contacting the authors. A web server for ProDesign is available at http://www.uhnresearch.ca/labs/tillier/ProDesign/ProDesign.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
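
To make the idea of finding specific and shared words with spaced seed hashing more concrete, the Python sketch below indexes sequences by spaced words. A spaced seed is a pattern of match positions ("1") and don't-care positions ("0"); only the characters at match positions form the hashed word. This is a generic illustration and not ProDesign's implementation: the function names, the seed pattern, and the toy sequences are all assumptions.

```python
from collections import defaultdict


def spaced_words(seq, seed):
    """Yield (start, word) for every window of seq under a spaced seed.

    seed is a string of '1' (position is kept) and '0' (position is ignored).
    """
    keep = [i for i, c in enumerate(seed) if c == "1"]
    for start in range(len(seq) - len(seed) + 1):
        yield start, "".join(seq[start + i] for i in keep)


def index_sequences(seqs, seed):
    """Map each spaced word to the set of sequence names that contain it."""
    index = defaultdict(set)
    for name, seq in seqs.items():
        for _, word in spaced_words(seq, seed):
            index[word].add(name)
    return index


def split_words(index):
    """Separate words seen in one sequence only from words shared by several."""
    specific = {w for w, names in index.items() if len(names) == 1}
    shared = {w for w, names in index.items() if len(names) > 1}
    return specific, shared


# Toy input (made up): with seed "101" the middle base of each window is ignored.
toy_index = index_sequences({"a": "ACGT", "b": "ATGA"}, "101")
toy_specific, toy_shared = split_words(toy_index)
```

Because don't-care positions tolerate mismatches, two related sequences can share a spaced word even where they differ, which is the property that makes such seeds useful for grouping family members.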

Subjects

Algorithms , Computational Biology/methods , Multigene Family , Oligonucleotide Probes/genetics , Bacteria/genetics , Bacterial Genome , Microarray Analysis , Oligonucleotide Array Sequence Analysis , Software
